Vocal Tract Length Normalization for Speaker Independent Acoustic-to-Articulatory Speech Inversion

Authors

  • Ganesh Sivaraman
  • Vikramjit Mitra
  • Hosung Nam
  • Mark K. Tiede
  • Carol Y. Espy-Wilson

Abstract

Speech inversion is a well-known ill-posed problem, and speaker differences typically make it even harder. This paper investigates a vocal tract length normalization (VTLN) technique that transforms the acoustic space of different speakers into a target speaker's space so that speaker-specific details are minimized. The speaker-normalized features are then used to train an acoustic-to-articulatory speech inversion system based on a feed-forward neural network. The acoustic features are parameterized as time-contextualized mel-frequency cepstral coefficients, and the articulatory features are represented by six tract-variable (TV) trajectories. Experiments are performed with ten speakers from the U. Wisc. X-ray microbeam database. Speaker-dependent speech inversion systems are trained for each speaker as baselines against which the speaker-independent approach is compared. For each target speaker, data from the remaining nine speakers are transformed using the proposed approach, and the transformed features are used to train a speech inversion system. The performance of the individual systems is compared using the correlation between the estimated and the actual TVs on the target speaker's test set. Results show that the proposed speaker normalization approach provides a 7% absolute improvement in correlation compared to the system trained without speaker normalization.
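The abstract names three concrete ingredients (a VTLN warp, time-contextualized features, and correlation-based evaluation) without specifying their exact form. The sketch below illustrates each one under assumptions that are not taken from the paper: the warp is the common piecewise-linear scaling of filter-bank center frequencies with a breakpoint at 85% of Nyquist, the context half-width `k` is arbitrary, and the per-TV Pearson correlations are simply averaged. All function names (`piecewise_linear_warp`, `stack_context`, `pearson`, `mean_tv_correlation`) are hypothetical.

```python
"""Hedged sketch of ingredients named in the abstract (assumed forms, not the authors' code)."""
import math
from typing import List, Sequence


def piecewise_linear_warp(freq_hz: float, alpha: float, nyquist_hz: float,
                          break_ratio: float = 0.85) -> float:
    """Warp one filter-bank center frequency by factor alpha (assumed piecewise-linear form).

    0 Hz and Nyquist are kept fixed; break_ratio is an assumed parameter.
    """
    # Breakpoint chosen so that alpha * f_break stays below Nyquist when alpha > 1.
    f_break = break_ratio * nyquist_hz / max(alpha, 1.0)
    if freq_hz <= f_break:
        return alpha * freq_hz  # uniform linear scaling region
    # Second segment: straight line from (f_break, alpha * f_break) to (Nyquist, Nyquist).
    slope = (nyquist_hz - alpha * f_break) / (nyquist_hz - f_break)
    return alpha * f_break + slope * (freq_hz - f_break)


def stack_context(frames: List[List[float]], k: int) -> List[List[float]]:
    """Concatenate each feature frame with its k left and k right neighbours.

    Frames beyond the utterance edges are replaced by the first/last frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        row: List[float] = []
        for d in range(-k, k + 1):
            row.extend(frames[min(max(t + d, 0), n - 1)])
        stacked.append(row)
    return stacked


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between an estimated and an actual trajectory."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_tv_correlation(estimated: List[List[float]],
                        actual: List[List[float]]) -> float:
    """Average the per-TV correlations (one trajectory list per tract variable)."""
    scores = [pearson(e, a) for e, a in zip(estimated, actual)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    centers = [250.0, 1000.0, 4000.0, 7500.0]
    print([round(piecewise_linear_warp(f, 1.1, 8000.0), 1) for f in centers])
    print(stack_context([[1.0], [2.0], [3.0]], k=1))
```

In a setup like the one described, a warp factor would be chosen per source speaker to move that speaker's spectra toward the target speaker before computing MFCCs; how the paper selects that factor is not stated in the abstract.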


Related articles

Analysis of Acoustic-to-Articulatory Speech Inversion Across Different Accents and Languages

The focus of this paper is estimating articulatory movements of the tongue and lips from acoustic speech data. While there are several potential applications of such a method in speech therapy and pronunciation training, performance of such acoustic-to-articulatory inversion systems is not very high due to limited availability of simultaneous acoustic and articulatory data, substantial speaker ...


On the Use of a Wave-Reflection Model for the Estimation of Spectral Effects due to Vocal Tract Length Changes with Application to Automatic Speech Recognition

Vocal tract length normalization (VTLN) is commonly used in state-of-the-art automatic speech recognition (ASR) systems to reduce the mismatch between speaker-dependent formant frequency scalings. Usually, the normalization is done by a piece-wise linear scaling of the filter bank center frequencies. The linear scaling is motivated by a uniform acoustic tube model that does not take any loss ef...


[inria-00544363, v1] Automatic adaptation of a vocal tract model

In this paper we present a method for adapting an articulatory model to a new speaker from acoustic data only. The main goal of this method is to make acoustic-to-articulatory inversion a fully automatic process. Speaker-specificity is modeled by a two-dimensional scale factor, which makes it more flexible than VTLN methods. Validation of the method is performed on three speakers by comparing t...


Recognizing Dysarthric Speech due to Amyotrophic Lateral Sclerosis with Across-Speaker Articulatory Normalization

Recent dysarthric speech recognition studies using mixed data from a collection of neurological diseases suggested articulatory data can help to improve the speech recognition performance. This project was specifically designed for the speaker-independent recognition of dysarthric speech due to amyotrophic lateral sclerosis (ALS) using articulatory data. In this paper, we investigated three acro...


Unsupervised vocal-tract length estimation through model-based acoustic-to-articulatory inversion

Knowledge of vocal-tract (VT) length is a logical prerequisite for acoustic-to-articulatory inversion. Prior work has treated VT length estimation (VTLE) and inversion largely as separate problems. We describe a new algorithm for VTLE based on acoustic-to-articulatory inversion. Our inversion process uses the Maeda model (MM, [1,2]) and combines global search [3] and dynamic programming for tra...



Publication date: 2016